7 research outputs found

    Design and analysis of scalable rule induction systems

    Machine learning has been studied intensively during the past two decades. One motivation has been the desire to automate knowledge acquisition during the construction of expert systems. The recent emergence of data mining as a major application for machine learning algorithms has created a need for algorithms that can handle very large data sets. In real data mining applications, data sets with millions of training examples, thousands of attributes and hundreds of classes are common. Designing learning algorithms suited to such applications has therefore become an important research problem. A great deal of research in machine learning has focused on classification learning. Among the various machine learning approaches developed for classification, rule induction is of particular interest for data mining because it generates models in the form of IF-THEN rules, which are more expressive and easier for humans to comprehend. One weakness of rule induction algorithms is that they often scale poorly to large data sets, especially noisy ones. The work reported in this thesis aims to design and develop scalable rule induction algorithms that can process large data sets efficiently while building the best possible models from them. There are two main approaches to rule induction, represented respectively by CN2 and the AQ family of algorithms. These approaches differ in the search strategy used to examine the space of possible rules, and each has its own advantages and disadvantages. The first part of this thesis introduces a new rule induction algorithm for learning classification rules, which broadly follows the approach of CN2. The algorithm uses a new search method that employs several novel search-space pruning rules and rule-evaluation techniques. The result is a highly efficient algorithm with improved induction performance.
Real-world data contain not only nominal attributes but also continuous ones. The ability to handle continuous-valued data is therefore crucial to the success of any general-purpose learning algorithm. Most current discretisation approaches are developed as preprocessing steps for learning algorithms. The second part of this thesis proposes a new approach that discretises continuous-valued attributes during the learning process. Incorporating discretisation into learning has the advantage of taking into account the bias inherent in the learning system as well as the interactions between different attributes, which in turn improves performance. Overfitting the training data is a major problem in machine learning, particularly when noise is present. Overfitting increases learning time and reduces both the accuracy and the comprehensibility of the generated rules, making learning from large data sets more difficult. Pruning is a widely used technique for addressing such problems and is consequently an essential component of practical learning algorithms. The third part of this thesis presents three new pruning techniques for rule induction based on the Minimum Description Length (MDL) principle. The result is an effective learning algorithm that not only produces an accurate and compact rule set but also significantly accelerates the learning process. RULES-3 Plus is a simple rule induction algorithm developed at the author's laboratory that follows an approach similar to that of the AQ family of algorithms. Although it has been applied successfully to many learning problems, it has some drawbacks that adversely affect its performance. The fourth part of this thesis reports an attempt to overcome these drawbacks using the ideas presented in the first three parts. A new version of RULES-3 Plus is reported that is a general and efficient algorithm with a wide range of potential applications.
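The CN2-style covering approach described in the first part of the thesis can be illustrated with a generic sequential-covering sketch. This is not the thesis's algorithm, whose novel pruning rules and rule-evaluation techniques are not described here. The sketch assumes a greedy general-to-specific search (CN2 itself uses a beam search; this is a beam of width 1) scored by the Laplace accuracy estimate used in later versions of CN2. The example format, function names and toy data are all hypothetical.

```python
from collections import Counter

# Hypothetical data layout: each example is (attributes_dict, class_label).


def covers(conditions, example):
    """A rule body is a dict {attribute: value}; every test must hold."""
    return all(example.get(attr) == value for attr, value in conditions.items())


def laplace(covered, target, n_classes):
    """Laplace accuracy estimate (n_c + 1) / (n + k) of a rule predicting target."""
    n_correct = sum(1 for _, label in covered if label == target)
    return (n_correct + 1) / (len(covered) + n_classes)


def majority_class(covered):
    # Ties go to the class encountered first (Counter.most_common behaviour).
    return Counter(label for _, label in covered).most_common(1)[0][0]


def learn_one_rule(examples, attributes, n_classes):
    """Greedy general-to-specific search (a beam of width 1) for one rule."""
    conditions, covered = {}, examples
    target = majority_class(covered)
    best_score = laplace(covered, target, n_classes)
    improved = True
    while improved:
        improved, best = False, None
        for attr in attributes:
            if attr in conditions:
                continue
            # sorted() makes tie-breaking deterministic; values must be comparable.
            for value in sorted({x[attr] for x, _ in covered}):
                candidate = {**conditions, attr: value}
                cand_cov = [(x, y) for x, y in covered if covers(candidate, x)]
                cls = majority_class(cand_cov)
                score = laplace(cand_cov, cls, n_classes)
                if score > best_score:  # strict: only specialise if it helps
                    best, best_score, improved = (candidate, cand_cov, cls), score, True
        if improved:
            conditions, covered, target = best
    return conditions, target


def sequential_covering(examples, attributes):
    """Learn an ordered rule list, removing the examples each new rule covers."""
    n_classes = len({label for _, label in examples})
    rules, remaining = [], list(examples)
    while remaining:
        conditions, target = learn_one_rule(remaining, attributes, n_classes)
        rules.append((conditions, target))
        # Each rule covers at least one remaining example, so the loop terminates.
        remaining = [(x, y) for x, y in remaining if not covers(conditions, x)]
    return rules


def classify(rules, example):
    """Return the class of the first rule whose conditions match."""
    for conditions, target in rules:
        if covers(conditions, example):
            return target
    return None


toy = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
rules = sequential_covering(toy, ["outlook", "windy"])
for conditions, target in rules:
    print(conditions or "(default)", "->", target)
```

Because rules are learned in order and each one removes the examples it covers, classification uses the first matching rule, and a final rule with an empty body acts as the default.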

    Evaluation of Climate Change Impacts on the Global Distribution of the Calliphorid Fly Chrysomya albiceps Using GIS

    Climate change is expected to influence the geographic distribution of many taxa, including insects. Chrysomya albiceps is one of the most widespread calliphorid flies and is of clear ecological, forensic, and medical importance. However, its global habitat suitability is changing as a result of climate change. Models that forecast the spatial distribution of species are increasingly used in wildlife management, highlighting the need for reliable techniques to assess their accuracy. We therefore used the maximum entropy algorithm implemented in Maxent to predict the current and future potential global geographic distribution of C. albiceps, and DIVA-GIS algorithms to confirm the predicted current model. The Maxent model was calibrated using 2177 occurrence records. Based on the jackknife test, four bioclimatic variables together with altitude were used to develop the final models. For the future models, two representative concentration pathways (RCPs), 2.6 and 8.5, were used for 2050 and 2070. The area under the curve (AUC) and the true skill statistic (TSS) were used to evaluate the resulting models, with values of 0.92 (±0.001) and 0.7, respectively. Two-dimensional niche analysis showed that the insect can adapt to temperatures from 9 °C to 27 °C and to precipitation from 0 mm to 2500 mm. The resulting models show the global distribution of C. albiceps and predict future changes to it, especially on the Mediterranean coasts of Europe and Africa, in Florida (USA), and on the coasts of Australia. Such predicted shifts place responsibility on decision makers to prevent damage to the economic, medical, and ecological sectors.
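The two evaluation metrics reported above can be sketched as follows. TSS is sensitivity + specificity - 1 computed from a confusion matrix at a chosen threshold, while AUC is threshold-independent and equals the probability that a randomly chosen presence receives a higher suitability score than a randomly chosen absence (Maxent typically uses background points in place of true absences). The function names and all numbers are hypothetical illustrations, not the study's data or code.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)  # fraction of presences predicted present
    specificity = tn / (tn + fp)  # fraction of absences predicted absent
    return sensitivity + specificity - 1


def auc(presence_scores, absence_scores):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical counts and suitability scores, not the study's data.
print(true_skill_statistic(tp=40, fn=10, fp=25, tn=75))
print(auc([0.9, 0.8, 0.4], [0.3, 0.5]))
```

The pairwise AUC loop is quadratic and meant only to show the definition; for thousands of points a rank-based formula gives the same value faster.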

    Health Hazards Associated with Wheat and Gluten Consumption in Susceptible Individuals and Status of Research on Dietary Therapies

    Wheat accounts for about 20% to over 50% of total calorie intake in the regions where it is grown. However, there is a clear perception that disorders related to wheat consumption are increasing, particularly in Western Europe, North America, and Australia. We consider here the evidence for this perception and discuss strategies and therapies that may be used to reduce the adverse impacts of wheat on the health of susceptible individuals. First, we introduce the major groups of wheat grain proteins, focusing on those associated with adverse reactions, and discuss in detail the three major adverse reactions triggered by wheat consumption, namely celiac disease, wheat allergy, and non-celiac gluten/wheat sensitivity. Finally, we discuss other issues associated with the consumption of gluten-free foods, focusing on gluten contamination of products purported to be gluten-free, gluten threshold or tolerance among celiac patients, and food labeling.

    SARS-CoV-2 vaccination modelling for safe surgery to save lives: data from an international prospective cohort study

    Background: Preoperative SARS-CoV-2 vaccination could support safer elective surgery. Vaccine supplies are limited, so this study aimed to inform their prioritization through modelling. Methods: The primary outcome was the number needed to vaccinate (NNV) to prevent one COVID-19-related death in 1 year. NNVs were based on postoperative SARS-CoV-2 rates and mortality in an international cohort study (surgical patients), and on community SARS-CoV-2 incidence and case fatality data (general population). NNV estimates were stratified by age (18-49, 50-69, and 70 years or more) and type of surgery. Best- and worst-case scenarios were used to describe uncertainty. Results: NNVs were more favourable in surgical patients than in the general population. The most favourable NNVs were in patients aged 70 years or more needing cancer surgery (351; best case 196, worst case 816) or non-cancer surgery (733; best case 407, worst case 1664). Both were more favourable (lower) than the NNV in the general population (1840; best case 1196, worst case 3066). NNVs for surgical patients remained favourable across a range of SARS-CoV-2 incidence rates in sensitivity analysis modelling. Globally, prioritizing preoperative vaccination of patients needing elective surgery ahead of the general population could prevent an additional 58 687 (best case 115 007, worst case 20 177) COVID-19-related deaths in 1 year. Conclusion: As the global rollout of SARS-CoV-2 vaccination proceeds, patients needing elective surgery should be prioritized ahead of the general population.
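The NNV outcome above is the reciprocal of the absolute risk reduction, which is why a higher baseline risk of death (as in older surgical patients) yields a smaller, more favourable NNV. The sketch below assumes the risk reduction is the product of infection risk, case fatality and vaccine efficacy over the same 1-year period; this is a simplified illustration, not necessarily the study's exact model, and the function name and inputs are hypothetical.

```python
def number_needed_to_vaccinate(infection_risk, case_fatality, efficacy):
    """NNV = 1 / absolute risk reduction (simplified, assumed model)."""
    risk_of_death = infection_risk * case_fatality  # unvaccinated 1-year risk
    absolute_risk_reduction = risk_of_death * efficacy  # deaths averted per person
    return 1 / absolute_risk_reduction


# Hypothetical inputs: 4% infection risk, 5% case fatality, 50% efficacy.
print(round(number_needed_to_vaccinate(0.04, 0.05, 0.5)))  # 1000
```

Doubling either the infection risk or the case fatality in this model halves the NNV, matching the direction of the differences reported between surgical patients and the general population.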